ABSTRACT

This study examined the reliability and validity of
self-reported survey data on instructional practices. It was based on a
nationwide survey of more than 25,000 teachers in more than 1,000 schools 
across 5 years. The survey instrument was the Classroom Instructional
Practice Scale (CIPS), which was based on the Classroom Information Sheet
developed by P. Wiesz and E. Cowen (1976). Although self-reported survey data 
might not capture the quality of the interaction between teachers and 
students, this study shows that survey data provide a fairly accurate 
description of how often teachers use various instructional practices that 
are consistent with the recommendations of several reform initiatives. There 
was consistent and solid agreement between what teachers reported and what 
students perceived in terms of their classroom activities. CIP scales were 
positively related to student achievement in mathematics. Survey results also 
suggest that grouped items, measuring the same underlying characteristics, 
provide more reliable measures of instructional practices both empirically 
and conceptually. Researchers proposed eight dimensions of quality 
instruction, and the factor structures of these dimensions were stable over 5 
years. The hypothesized model fit the data well. As policymakers focus on 
assessing instructional trends, it is not plausible to rely on in-depth 
studies of a small number of classrooms. Survey data will provide the most 
cost-effective way of measuring national trends in instruction. (Contains 7 
tables and 16 references.) (SLD) 
Introduction

Recent efforts for educational reform have brought our attention to changes in instructional 
practices. Educators and policymakers are interested in identifying the instructional practices that 
“work” in improving student performance (Brophy and Good, 1986). This has led to current 
enthusiasm for educational standards in several curriculum areas (National Council of Teachers of 
Mathematics, 1989; National Council of Teachers of English, 1996). To monitor the impact of such 
efforts, we need accurate data on instructional practices. Much of the data on instructional practices 
are self-reported by teachers and have traditionally been of questionable quality (Burstein, McDonnell, Van
Winkle, Ormseth, Mirocha, & Guitton, 1995). As Burstein et al. (1995) argued, little effort has been 
made to validate whether the national survey data measure the complex procedure of classroom 
instruction. This explains why many studies on instructional practices have depended on in-depth 
case studies from a handful of classrooms. It is hardly possible to generalize the findings to other 
classrooms. The limited generalizability of case studies becomes more problematic as policymakers 
need to understand the impact of reforms in our educational system. For that reason, survey data, a 
cost-effective way to include a large number of classrooms, are very appealing. Few studies,
however, have examined the validity of the self-reported data on instructional practices although 
they have often been used to determine the impact of educational reforms. Mayer (1999) called for 
more research on the issues of survey reliability and validity. The purpose of this study is to obtain 
evidence of the reliability and the validity of a self-reported survey inventory designed to assess the
degree to which teachers implement recommended instructional practices in the classroom. 

Data and Method 

A large-scale survey was developed to examine the degree to which a broad range of 
recommendations for effective school reform are implemented in a school as well as to examine 
more fully their impact on students and staff. Among a variety of sections which examine
dimensions of whole school reform, the survey has one section that asks questions on the frequency
with which each teacher uses instructional practices based on the recommendations of nationwide 
reform initiatives on middle grades (Turning Points, 1989) and national curriculum standards
(NCTM, 1989; NCTE, 1996). Teachers reported the frequency with which they used various 
instructional practices using a 7-point scale with the following response categories: “never”, “several 
times a year”, “monthly”, “several times a month”, “weekly”, “several times a week”, and “daily”. 
The Classroom Instructional Practice Scale (CIPS) was originally developed from the classroom 
routine section of the Classroom Information Sheet (Wiesz & Cowen, 1976) as well as further items 
that were written by the authors to assess specific middle school practices. In 1992-93, extensive 
factor analyses were done with 37 middle schools in Illinois. Eight sub-scales consisting of 56 items 
emerged as distinct empirical factors (see Table 1). These were also validated by the conceptual 
judgment of a panel of experts 1 . By 1996-97, the number of participating schools was increased from
39 schools in Illinois to 401 schools in 16 states. Data for this study were drawn from the survey 
administered to a large number of teachers and students in middle grades across 5 years (1992-93 to 
1996-97). Only the teachers who teach middle grades (grades 6 to 8) in typical middle grade schools 
(6-8, 7-8, 5-8, 7-9, etc.) were selected 2 . In addition, only the classroom teachers who teach “core” 
subject areas were selected as we found that instructional practices in non-core subject areas were 
quite different from those in core subjects 3 . Table 2 shows the characteristics of the teachers included 
in the study across years. 

Research for the study was conducted with three different statistical techniques: factor
analytic study, reliability study and correlational study. First, the exploratory factor analyses were
conducted to identify conceptually meaningful dimensions of CIPS. Factor structures can vary due to
sampling fluctuation and differences in factor analytic procedures. Therefore, considerable attention
was given to the stability or robustness of CIPS factor structures over time. A series of confirmatory 
factor analyses was also conducted to see whether the proposed measurement model adequately fit 
the sample data (Byrne, 1994). Second, the reliability analyses were conducted to examine the 
internal consistency in teacher responses using coefficient alpha statistics (Cronbach, 1951). 
Coefficient alpha was selected because the items on the survey were scored polytomously. Last, 
correlations between teacher report and student report of instructional practices, and correlations 
between teacher report of instructional practices and student achievement 4 were examined to provide 
evidence of criterion-related validity of instructional practice measures. 
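
As an illustration of the reliability step described above, coefficient alpha for a k-item scale can be computed from the item variances and the variance of the summed score. The following is a minimal sketch, not the authors' analysis code. The function name, the toy responses, and the coding of the 7-point frequency scale as 1 ("never") through 7 ("daily") are assumptions made for the example.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents' item scores.

    responses: list of rows, one row per teacher, one score per item.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(n_items)]
    # Variance of the summed scale score across respondents.
    total_var = pvariance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from four teachers to a three-item scale
# (assumed coding: 1 = never ... 7 = daily).
toy_responses = [
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
    [7, 6, 7],
]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 3))
```

Because alpha depends only on the ratio of the summed item variances to the total variance, using population or sample variances gives the same value as long as the choice is consistent.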

Results 

Factor analyses 

Oblique rotation of 7, 8, and 9 factors was undertaken for 1992-93 data. The eight-factor 
solution afforded the psychologically most meaningful interpretation of the empirical dimensions of 
the instructional practice construct. The eight-factor solution was applied to the data for later years to see
whether the factor structures were stable across years. The factor loadings in 1992-93, 1994-95 and 
1996-97 are presented in Table 3 5 . Most of the factors were clean and readily identifiable. Although 
some items were loaded on multiple factors in later years, extracted factors were, in general, 
congruent across years. When the items were loaded on multiple factors, they were classified on a 
conceptual basis judged by the panel of experts. The items “Students provide feedback and 
comments on each other’s work”, “Alternative/authentic assessments are employed to evaluate
student learning” and “Self-paced learning materials are utilized” are examples of those cases. 

Maximum likelihood confirmatory factor analyses (CFA) were employed to examine the 
goodness of fit of the eight CIPS scale model (Bollen, 1989; Hoyle, 1995). EQS for Windows 
Release 5.7b (Bentler & Wu, 1995) was utilized to estimate the parameters of models consisting of
the eight factors. Table 4 presents the goodness of fit indexes across 5 years. The χ² goodness-of-fit
statistics (Joreskog, 1969), Nonnormed Fit Index (NNFI: Bentler & Bonnet, 1980) and Comparative
Fit Index (CFI: Bentler, 1990) are reported 6 . A baseline model was employed in which each item was
allowed to load on only one of the eight hypothesized latent constructs. These latent variables were 
allowed to covary, and residual covariances were fixed to zero. While the hypothesized 8-factor 
model (Model 1) did not fit the data adequately (CFI for the model ranged from .83 to .86 across 5 
years), the fit indexes were sufficiently high to suggest that modification would yield models with 
acceptable fit (Bentler & Bonnet, 1980). As some items are closely related to each other, and some 
items load on multiple factors, we decided to allow several items to be inter-correlated.
Based on the modification indexes provided by the stepwise multivariate Lagrange Multiplier test,
the final model (Model 2) with 23 correlated residuals and 6 cross-loaded items was tested. The 
model with correlated residuals attained a level of fit that is generally considered to be acceptable 
(Bentler & Bonnet, 1980): NNFI was about .90 and CFI was about .91 across 5 years. 
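
For readers unfamiliar with the two fit indexes, the sketch below shows how NNFI and CFI are conventionally computed from the χ² and degrees of freedom of the fitted model and of a null (independence) model in which all items are uncorrelated. Note that this null model is not the "baseline model" named in the text, which refers to the initial eight-factor specification. The function names and the numeric inputs are hypothetical and are not values from Table 4.

```python
def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index from model and null-model chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, d_model, 0.0)
    if d_null == 0:
        # Neither model shows misfit beyond its degrees of freedom.
        return 1.0
    return 1.0 - d_model / d_null


def nnfi(chi2_model, df_model, chi2_null, df_null):
    """Nonnormed Fit Index (also known as the Tucker-Lewis index)."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


# Hypothetical chi-square values, for illustration only.
example_cfi = cfi(chi2_model=100.0, df_model=50, chi2_null=1000.0, df_null=100)
example_nnfi = nnfi(chi2_model=100.0, df_model=50, chi2_null=1000.0, df_null=100)
print(round(example_cfi, 3), round(example_nnfi, 3))
```

Both indexes compare the misfit of the hypothesized model with that of the null model. NNFI additionally penalizes model complexity through the χ²/df ratios, which is why it can fall slightly below CFI for the same model.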

Reliability 

Having identified robust and distinctive dimensions of instructional practices, we examined
the internal consistency of the factorially derived CIPS scales. Table 5 shows the Cronbach’s
coefficient alpha statistics for 8 CIPS sub-scales across 5 years. All scales showed moderate to high
levels of internal consistency across years. All scales except Integration and Coverage of Health
Topics and Mastery Based Assessment and Student Recognition had coefficient alphas ranging from .80
to .91. Mastery Based Assessment and Student Recognition had alphas slightly below .80
(ranging from .76 to .79), whereas Integration and Coverage of Health Topics had alphas ranging from
.58 to .62. The latter scale has only three items, whereas the other scales have 7 to 8 items. This, in
part, explains the relatively low reliability of this scale. When the total instructional practice scale 
based on all 56 items was used, the reliability coefficient was very high across all years 
(approximately .95). Moreover, levels of internal consistency did not differ substantially between 
boys and girls, among grade levels, and among students from diverse racial, ethnic, and
socio-economic backgrounds.
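
The link between scale length and reliability noted for the three-item Integration and Coverage of Health Topics scale can be illustrated with the Spearman-Brown prophecy formula. This is a sketch under a strong assumption that the paper does not test: any added items would be parallel to the existing ones. The function name and the input values are chosen for illustration.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a scale is lengthened by length_factor.

    Assumes the added items are parallel to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Illustrative input: a 3-item scale with alpha = .60 (within the range
# reported for the health topics scale), hypothetically lengthened to 8 items.
projected = spearman_brown(reliability=0.60, length_factor=8 / 3)
print(round(projected, 2))
```

Under that assumption, an alpha of .60 on three items projects to roughly .80 on eight items, which is in line with the alphas reported for the longer scales.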

Correlational analyses 

We also examined the extent to which teacher responses on their instructional practices were 
congruent with student responses. Teachers and students in grades 6, 7, or 8 in middle schools were 
selected and their responses were aggregated at the school-level. Results on the correlations between 
teacher and student reports of classroom practices are reported in Table 6. Similar items were asked 
of both teachers and students on two CIPS scales: Small Group Active Instruction, and Integration 
and Interdisciplinary Practices. Table 6 shows a significant relationship between teacher and student
reports of the instructional practices (p < .01). Correlations between teacher and student reports of
Small Group Instruction ranged from .52 to .66, whereas correlations between teacher and student 
reports of Integration ranged from .61 to .76 across 5 years. When teachers reported using
integration and small group activities more frequently, students also reported engaging in more of
these activities, which supports the validity of teachers’ self-reported
instructional practices.
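
The school-level comparison described above can be sketched as two steps: average teacher and student scale scores within each school, then correlate the two sets of school means. The code below is a minimal illustration rather than the authors' procedure. The record layout, the function names, and the toy data are assumptions.

```python
from collections import defaultdict
from math import sqrt


def school_means(records):
    """Average scale scores within each school.

    records: iterable of (school_id, scale_score) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for school_id, score in records:
        sums[school_id] += score
        counts[school_id] += 1
    return {school: sums[school] / counts[school] for school in sums}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def teacher_student_agreement(teacher_records, student_records):
    """Correlate school-level teacher means with school-level student means."""
    teacher = school_means(teacher_records)
    student = school_means(student_records)
    # Keep only schools that have both teacher and student reports.
    schools = sorted(set(teacher) & set(student))
    return pearson_r([teacher[s] for s in schools], [student[s] for s in schools])


# Hypothetical records for three schools.
teachers = [("A", 2.0), ("A", 4.0), ("B", 5.0), ("C", 6.0), ("C", 8.0)]
students = [("A", 1.0), ("A", 1.0), ("B", 2.0), ("C", 3.0)]
print(teacher_student_agreement(teachers, students))
```

Aggregating to the school level means that each school contributes one data point, so the correlation reflects agreement between schools rather than between individual teachers and their own students.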

We also examined the correlation between teacher report of their instructional practices and 
student achievement. In order to make the relationship more comparable, we examined the 
correlation between mathematics teachers’ report of instructional practices and students’
mathematics achievement. Achievement data were available for Illinois schools only from 1993-94 
to 1995-96. Table 7 shows significant and positive correlations between instructional practices of 
mathematics teachers and student math achievement, especially for Small Group Instruction and 
Integration and Interdisciplinary Practices. The correlations ranged from .25 to .60 for Small Group Instruction
and from .38 to .87 for Integration.

Summary and discussion

This study examines the reliability and validity of self-reported survey data on instructional 
practices. It is based on a nationwide survey of more than 25,000 teachers in over 1,000 schools
across 5 years. Although self-reported survey data might not capture the quality of interaction
between teachers and students, our study shows that survey data provide a fairly accurate description 
of how often teachers use various instructional practices that are consistent with the 
recommendations of several reform initiatives. There was consistent and solid agreement between 
what teachers reported and what students perceived in terms of their classroom activities. CIP scales 
were positively related to student achievement in mathematics. Instead of using individual indicators, 
we found that grouped items, measuring the same underlying characteristics, provide more reliable 
measures of instructional practices. We proposed 8 dimensions of quality instruction. They measure 
distinctive constructs of instructional practices both empirically and conceptually. Their factor 
structures were stable over 5 years and the hypothesized model fit the data well. As policymakers 
focus more and more on assessing instructional trends, it is not plausible to rely on in-depth studies 
of a small number of classrooms. Survey data will provide the most cost-effective way of measuring 
national trends in instruction. 